Patent abstract:
A linear method for performing head motion estimation from facial feature data is provided. The method comprises: obtaining a first facial image and detecting a head in the first image; detecting the locations of four points P of the first face image, where P = {p_1, p_2, p_3, p_4} and p_k = (x_k, y_k); obtaining a second facial image and detecting a head in the second image; detecting the locations of the four corresponding points P' of the second face image, where P' = {p'_1, p'_2, p'_3, p'_4} and p'_k = (x'_k, y'_k); and determining the motion of the head, represented by a rotation matrix R and a translation vector T, using the points P and P'. The head motion estimate is determined according to equation (I), P_i' = R P_i + T, where the rotation matrix R and the translation vector T = [T_1 T_2 T_3]^T of equation (II) represent the camera rotation and translation, respectively.
Publication number: KR20040037152A
Application number: KR10-2004-7004460
Filing date: 2002-09-10
Publication date: 2004-05-04
Inventors: Miroslav Trajkovic; Vasanth Philomin; Srinivas V.R. Gutta
Applicant: Koninklijke Philips Electronics N.V.
IPC primary class:
Patent description:

Head motion estimation from four feature points
[2] Head pose recognition is an important research area in human-computer interaction, and many approaches to head pose recognition have been proposed. Most of these approaches model the face with a set of facial features. For example, most existing approaches use six facial features, including the pupils, nostrils, and lip corners, to model the face. Another approach, published by Z. Liu and Z. Zhang in "Robust Head Motion Computation by Taking Advantage of Physical Properties," Proc. Workshop on Human Motion, pp. 73-80, Austin, December 2000, models the face with five facial feature points, including the eye corners, the mouth corners, and the tip of the nose. In Liu and Zhang's work, head motion is estimated from the five feature points through nonlinear optimization. Indeed, existing algorithms for face pose estimation are nonlinear.
[3] It is desirable to provide a face pose estimation algorithm that is linear and has lower computational requirements than nonlinear solutions.
[4] It is also desirable to provide a face pose estimation algorithm that is linear and relies on only four feature points, such as the eye corners and the mouth corners.
[1] The present invention relates to systems and methods for computing a head motion estimate from facial image locations, e.g., the eye corners and mouth corners, and in particular to a linear method for head motion estimation using four facial feature points. As a particular case, an algorithm for head pose estimation from the four feature points is additionally described.
[11] FIG. 1 is a diagram showing the configuration of the typical feature points of a generic head.
[12] FIG. 2 shows the face geometry 10 that provides the basis for the head pose estimation algorithm of the present invention.
[5] It is therefore an object of the present invention to provide a head motion estimation algorithm that is a linear solution.
[6] Another object of the present invention is to provide a head motion estimation algorithm that is linear and uses four facial feature points.
[7] It is yet another object of the present invention to provide a head pose estimation algorithm that is based on the head motion estimation algorithm.
[8] According to the principles of the present invention, a linear method is provided for performing head motion estimation from facial feature data, the method comprising: obtaining a first facial image and detecting a head in the first image; detecting the locations of four points P of the first face image, where P = {p_1, p_2, p_3, p_4} and p_k = (x_k, y_k); obtaining a second facial image and detecting a head in the second image; detecting the locations of the four corresponding points P' of the second face image, where P' = {p'_1, p'_2, p'_3, p'_4} and p'_k = (x'_k, y'_k); and determining the motion of the head, represented by the rotation matrix R and the translation vector T, using the points P and P'. The head motion estimate is determined by the equation P_i' = R P_i + T, where R and T = [T_1 T_2 T_3]^T represent the camera rotation and translation, respectively; head pose estimation is a specific case of head motion estimation.
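By way of illustration only, the following sketch applies the motion model P_i' = R P_i + T to four 3D feature points and projects the result; it is not part of the claimed method, and the coordinate values are invented for the example.

```python
import numpy as np

def rotation_yaw(theta):
    """Rotation about the vertical (Y) axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[  c, 0.0,   s],
                     [0.0, 1.0, 0.0],
                     [ -s, 0.0,   c]])

# Four 3D feature points (columns): eye corners and mouth corners.
# These coordinates are illustrative only.
P = np.array([[-0.3, 0.3, -0.2,  0.2],   # X
              [ 0.2, 0.2, -0.3, -0.3],   # Y
              [ 1.0, 1.0,  1.1,  1.1]])  # Z (depths)

R = rotation_yaw(np.deg2rad(10.0))
T = np.array([[0.05], [0.0], [0.1]])

P_prime = P_moved = R @ P + T        # the motion model P_i' = R P_i + T
p_img = P_prime[:2] / P_prime[2]     # perspective projection: (x, y) = (X/Z, Y/Z)
print(p_img)                         # image coordinates in the second view
```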
[9] Advantageously, the head pose estimation algorithm from the four feature points can be used in avatar control applications, video chat, and face recognition applications.
[10] The details of the invention will now be described with reference to the drawings.
[13] In accordance with the principles of the present invention, a linear method is provided for computing a head motion estimate from the image locations of the eye corners and mouth corners. In particular, the method estimates head motion from four point matches, with head pose estimation being a particular case in which a frontal image is used as the reference position.
[14] This method compares favorably with other, nonlinear methods, which require many point matches (at least seven) or at least five facial feature matches.
[15] In general, the method for head motion estimation proceeds as follows. The first step is to obtain a first image I_1 and detect the head in I_1. Next, the points P corresponding to the eye corners and mouth corners are detected in I_1, i.e., P = {p_1, p_2, p_3, p_4}, where p_k = (x_k, y_k) denotes the image coordinates of a point. A second image I_2 is then obtained and the head is detected in I_2. The points P' corresponding to the eye corners and mouth corners are then detected in I_2, P' = {p'_1, p'_2, p'_3, p'_4}, where p'_k = (x'_k, y'_k). From P and P', the next step determines the motion of the head, represented by the rotation matrix R and the translation vector T. Once the motion parameters R and T are computed, the 3D structure of all the point matches can be computed. However, structure and translation can be determined only up to a scale, so if the magnitude of T is fixed, the structure is uniquely determined; conversely, if the depth of one point in 3D is fixed, T is uniquely determined.
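The overall flow of these steps might be organized as follows. This is a hedged sketch: detect_head, detect_corners, reconstruct_3d, and solve_motion are hypothetical helper names standing in for the steps described above, not functions named in the patent.

```python
def estimate_head_motion(image1, image2, detect_head, detect_corners,
                         reconstruct_3d, solve_motion):
    """Sketch of the four-point head motion pipeline.

    All callables passed in are hypothetical stand-ins for the
    detection, reconstruction, and motion-solving steps in the text.
    """
    # Steps 1-2: detect the head and the four corner points in image I_1.
    head1 = detect_head(image1)
    P_img = detect_corners(head1)        # 4x2 array of (x_k, y_k)

    # Steps 3-4: same for image I_2.
    head2 = detect_head(image2)
    P_img_prime = detect_corners(head2)  # 4x2 array of (x'_k, y'_k)

    # Step 5: recover 3D coordinates using the face-shape constraints
    # (parallel eye/mouth segments, orthogonal symmetry axis), then
    # solve P_i' = R P_i + T linearly for R and T.
    P3d = reconstruct_3d(P_img)
    P3d_prime = reconstruct_3d(P_img_prime)
    R, T = solve_motion(P3d, P3d_prime)
    return R, T
```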
[16] As mentioned above, the algorithm for head pose estimation is a specific case of the head motion estimation algorithm, and there are two ways in which it can be realized: 1) an interactive method requiring a reference image; and 2) an approximate method using a generic (average biometric) head shape, also called a generic head model (GHM).
[17] For the interactive algorithm, the following steps are executed: 1) before using the system, the user is asked to face the camera from a predetermined reference position, and the reference eye and mouth corner locations P_0 are obtained as described in the steps above; 2) when a new image is obtained, the eye corners and mouth corners are detected, and the head motion is estimated by the remaining steps of the algorithm; 3) the resulting head rotation matrix corresponds to the head pose matrix.
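A minimal sketch of this interactive variant is shown below; the camera object and the detect_corners, reconstruct_3d, and solve_motion callables are assumptions for illustration, not part of the patent.

```python
def interactive_head_pose(camera, detect_corners, reconstruct_3d, solve_motion):
    """Sketch of the interactive variant: the motion from a stored frontal
    reference to the current frame is, by construction, the head pose.
    All helpers are hypothetical stand-ins."""
    # Step 1: calibration -- the user faces the camera from the reference
    # position, and the reference corner points P_0 are reconstructed in 3D.
    P0 = reconstruct_3d(detect_corners(camera.capture()))

    # Steps 2-3: for each new frame, the estimated rotation is the pose matrix.
    while True:
        P = reconstruct_3d(detect_corners(camera.capture()))
        R, T = solve_motion(P0, P)   # solves P = R P_0 + T
        yield R                      # head pose relative to the reference
```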
[18] The approximate algorithm does not require any interaction with the user, but assumes that certain biometric information is available and fixed for all users. For example, FIG. 1 shows the configuration of generic feature points for a generic head 19 with respect to the camera coordinate system 20, designated C_xyz. In FIG. 1, the points P_1 and P_3 represent an eye corner and a mouth corner of the generic head model 19, respectively. For the frontal face shown in FIG. 1, these points P_1 and P_3 lie at different depths (Z_1 and Z_3, respectively). The angle τ is assumed known and equal to an average taken over all users' heads. This is not an exact value, but the pitch angle is very difficult to compute exactly, because even the same person, when asked to look straight at the camera in repeated trials, may tilt the head differently. For a fixed angle τ, the head pose can be determined from a single image of the head, as described in more detail below.
[19] For illustrative purposes, assume that a camera or digital image capture device obtains two images of the model head at different positions. Let the points p_1, p_2, p_3, p_4 denote the image coordinates of the eye corners (points p_1, p_2) and the mouth corners (points p_3, p_4) in the first image, and let the points p'_1, p'_2, p'_3, p'_4 denote the corresponding eye and mouth corner coordinates in the second image. Given these feature coordinates, the task is to determine the head motion (represented by a rotation and a translation) between the first and second images.
[20] In general, the algorithm proceeds in the following steps: 1) using the facial constraints, compute the three-dimensional (3D) coordinates of the feature points from the two images; and 2) given the 3D positions of the feature points, compute the motion parameters (the rotation matrix R and the translation vector T).
[21] The step of computing the 3D coordinates of the feature points is now described. As shown in the face geometry 10 of FIG. 2, the points p_1, p_2, p_3, p_4 and p'_1, p'_2, p'_3, p'_4 denote the eye and mouth corners in the two images, each with corresponding 3D coordinates. For the face geometry shown in FIG. 2, the following properties are assumed: 1) the line segment 12 connecting the points p_1 p_2 is parallel to the line segment 15 connecting the points p_3 p_4, i.e., p_1 p_2 ∥ p_3 p_4; and 2) the line segment 12 connecting the points p_1 p_2 is orthogonal to the line segment connecting the points p_5 p_6, where p_5 and p_6 are the midpoints of the segments p_1 p_2 and p_3 p_4, respectively. Numerically, these properties 1 and 2 can be written according to equations (1) and (2), respectively, as follows.
[22] (P_1 - P_2) × (P_3 - P_4) = 0 (1)
[23] (P_1 - P_2) · (P_5 - P_6) = 0 (2)
[24] where P_i = [X_i Y_i Z_i]^T represents the 3D coordinates of the image point p_i. The relation between the image coordinates and the three-dimensional (3D) coordinates of an arbitrary point P_k is given by the well-known perspective equation:
[25] P_k = Z_k p_hk, i.e., x_k = X_k / Z_k and y_k = Y_k / Z_k, where p_hk = [x_k y_k 1]^T (3)
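For concreteness, a minimal numeric check of properties 1) and 2) and of the perspective relation (3) might look as follows; the coordinate values are invented for illustration and are not from the patent.

```python
import numpy as np

# Illustrative frontal-face 3D corner coordinates (invented values).
P1, P2 = np.array([-0.3, 0.2, 1.0]), np.array([0.3, 0.2, 1.0])    # eye corners
P3, P4 = np.array([-0.2, -0.3, 1.1]), np.array([0.2, -0.3, 1.1])  # mouth corners
P5, P6 = (P1 + P2) / 2, (P3 + P4) / 2                             # midpoints

# Equation (1): parallel segments -> vanishing cross product.
print(np.cross(P1 - P2, P3 - P4))   # -> [0. 0. 0.]

# Equation (2): orthogonal segments -> vanishing dot product.
print(np.dot(P1 - P2, P5 - P6))     # -> 0.0

# Equation (3): perspective relation P_k = Z_k * [x_k, y_k, 1]^T.
x1, y1 = P1[0] / P1[2], P1[1] / P1[2]   # image coordinates of p_1
assert np.allclose(P1, P1[2] * np.array([x1, y1, 1.0]))
```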
[26] Since it is well known that structure can be recovered from a monocular image sequence only up to scale, one of the Z coordinates is fixed and the remaining coordinates are computed relative to it. Therefore, to simplify the computation, and without loss of generality, we assume Z_1 = 1. By cross-multiplying each numerator with the denominator of the other fraction and substituting equation (3) into equation (1), the relations given in equations (4) and (5) are obtained.
[27]
[28]
[29] Writing equations (4) and (5) in matrix form yields equation (6).
[30]
[31] This system has a nontrivial solution for Z_3 and Z_4 only if its determinant is equal to zero, i.e.,
[32]
[33] Equivalently, equation (7) can be written as equation (8).
[34]
[35] Equation (8) is a quadratic polynomial and has two solutions. It is easy to verify that one solution is the obvious Z_2 = 1 (e.g., by substitution into equation (7)); the second solution is obtained as follows.
[36]
[37] By substituting Z_2 into either of equations (4) and (5), one linear equation in Z_3 and Z_4 is obtained. Another equation, of the following form, is obtained by substituting equation (3) into equation (2):
[38]
[39] where p_hi = [x_i y_i 1]^T. Z_3 and Z_4 can now be obtained from equations (10) and (4).
[40] As is known, the movement of the head points can be expressed according to equation (11).
[41] P_i' = R P_i + T (11)
[42] where R (the 3×3 rotation matrix) and T = [T_1 T_2 T_3]^T represent the camera rotation and translation, respectively. Equation (11) can now be rewritten as a system of equations that is linear in the entries of R and T.
[43]
[44] From equation (12), it is observed that each pair of points yields three equations. Since there are 12 unknowns in total, at least four point pairs are needed to solve linearly for the rotation and translation.
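Since each match contributes three linear equations in the nine entries of R and the three entries of T, four or more matches can be stacked into a 12-unknown least-squares system. The following is a minimal sketch under that formulation (an illustration, not code from the patent):

```python
import numpy as np

def solve_motion_linear(P, P_prime):
    """Linearly solve P_i' = R P_i + T for R (3x3) and T (3,).

    P, P_prime: (N, 3) arrays of matched 3D points, N >= 4.
    """
    N = P.shape[0]
    A = np.zeros((3 * N, 12))
    b = P_prime.reshape(-1)
    for i in range(N):
        for row in range(3):
            # Equation for coordinate `row` of point i:
            #   r_row . P_i + T_row = P'_i[row]
            A[3 * i + row, 4 * row:4 * row + 3] = P[i]
            A[3 * i + row, 4 * row + 3] = 1.0
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    M = x.reshape(3, 4)
    return M[:, :3], M[:, 3]   # R (not yet orthogonalized), T
```

With exactly four noise-free matches the system above is square (12×12); additional or noisy matches simply overdetermine the same least-squares problem.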
[45] The elements of the matrix R are not independent (i.e., RR^T = I), so once the matrix R is found, it must be corrected to represent a true rotation matrix. This can be done by decomposing R into the form R = USV^T using singular value decomposition (SVD) and computing a new rotation matrix according to equation (13):
[46] R = UV^T (13)
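A sketch of this correction step is given below; the determinant check is a standard safeguard against reflections that I have added, not something the text spells out.

```python
import numpy as np

def nearest_rotation(R_est):
    """Correct a linearly estimated matrix to a true rotation via SVD,
    per equation (13): decompose R_est = U S V^T and return U V^T."""
    U, S, Vt = np.linalg.svd(R_est)
    R = U @ Vt
    if np.linalg.det(R) < 0:   # reflection guard (an added safeguard,
        U[:, -1] *= -1         # not spelled out in the text)
        R = U @ Vt
    return R
```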
[47] As is known, the head pose can be uniquely represented either as a set of three angles (yaw, roll, and pitch) or as the rotation matrix R (there is a one-to-one correspondence between the rotation matrix and the pose angles). Although the interactive head pose estimate is equivalent to the head motion estimate, the approximate head pose estimate can be simplified by breaking it into the following steps: 1) assume the user tilts the head so that the eye and mouth corners are at the same distance from the camera (Z_1 = Z_2 = Z_3 = Z_4), a configuration referred to here as the ARP; 2) compute the head pose for the ARP; and 3) update the pitch angle by simply subtracting τ from its value in the ARP.
[48] The rotation matrix R satisfies the condition RR^T = I, or equivalently,
[49] r_i · r_j = δ_ij for i, j ∈ {1, 2, 3}, (14)
[50] and can be written in terms of its columns as
[51] R = [r_1 r_2 r_3].
[52] Suppose that F_1, F_2, F_3, F_4 denote the 3D coordinates of the eye corners and mouth corners of the frontal face, which serves as the reference. Then, considering the shape constraints of the face and constraint 1) above, the relations given by equation (15) are obtained:
[53]
[54] Here, the symbol ∝ means "equal up to a certain scale factor." The aim of the present invention is to find a pose matrix R that maps the points P_k to F_k; in other words,
[55] F_k ∝ R P_k. (16)
[56] In terms of the columns of the rotation matrix, equation (16) can be written as follows.
[57]
[58] From the second and fourth equations in equation (17), r_3 can be calculated as follows.
[59]
[60] The remaining components of the rotation matrix can then be calculated from equations (14) and (17) as follows.
[61]
[62] From equation (19), it is straightforward to compute the yaw, roll, and pitch angles. The exact pitch angle is then obtained by subtracting τ from its current value.
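Extracting the three angles from R depends on the chosen angle convention; the sketch below assumes R = Ry(yaw) · Rx(pitch) · Rz(roll), which is my assumption for illustration (the text does not fix a convention here), with the final pitch correction by τ as described.

```python
import numpy as np

def pose_angles(R, tau):
    """Yaw, pitch, roll from R, assuming R = Ry(yaw) Rx(pitch) Rz(roll);
    the convention is an assumption, not fixed by the text. The pitch is
    corrected by the biometric tilt angle tau, as described above."""
    yaw = np.arctan2(R[0, 2], R[2, 2])
    pitch = -np.arcsin(np.clip(R[1, 2], -1.0, 1.0))
    roll = np.arctan2(R[1, 0], R[1, 1])
    return yaw, pitch - tau, roll
```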
[63] While the invention has been shown and described with reference to preferred embodiments thereof, it will be understood that various modifications and changes in form or detail may readily be made without departing from the spirit of the invention. It is therefore intended that the invention not be limited to the exact forms described and shown, but to cover all modifications that may fall within the scope of the appended claims.
Claims:
Claims (17)
[1" claim-type="Currently amended] A linear method for performing head motion estimation from facial feature data, the method comprising:
Obtaining a first facial image and detecting a head in the first image;
Detecting the locations of four points P of the first face image, wherein P = {p_1, p_2, p_3, p_4} and p_k = (x_k, y_k);
Obtaining a second facial image and detecting a head in the second image;
Detecting the locations of the four points P' of the second face image, wherein P' = {p'_1, p'_2, p'_3, p'_4} and p'_k = (x'_k, y'_k); and
Determining the motion of the head, represented by the rotation matrix R and the translation vector T, using the points P and P'.
[2" claim-type="Currently amended] The method of claim 1, wherein the four points P of the first face image 10 and the four points P ′ of the second face image are respectively the tails of the eyes and the mouth of each of the first and second face images. A linear method comprising locations.
[3" claim-type="Currently amended] The method of claim 1, wherein the head motion estimation is determined according to the following equation P i '= RP i + T, wherein And T = [T 1 T 2 T 3 ] T represent camera rotation and translation motion, respectively, and head pose estimation is a specific example of head motion estimation.
[4" claim-type="Currently amended] 4. The method of claim 3, wherein the head motion estimation is determined according to the rotation matrix R, the method further comprising determining a rotation matrix R that maps points P k to F k to characterize the head pose. The points F 1 , F 2 , F 3 , and F 4 represent three-dimensional (3D) coordinates of each of the four points of the face image of the front face as a reference, and P k represents three-dimensional ( 3D) coordinates, where P i = [X i Y i Z i ] T and the mapping is determined according to

Where P 5 and P 6 are the midpoints of the line segment 12 connecting the points P 1 P 2 and the line segment 15 connecting the points P 3 P 4, and the line connecting the points P 1 P 2 is the point connecting the points P 5 P 6 . A linear method that is orthogonal to a line segment and ∝ represents a proportional factor.
[5" claim-type="Currently amended] The method of claim 4, wherein the components r 1 , r 2 , r 3 are as follows.

Calculated, linear way.
[6" claim-type="Currently amended] The method of claim 5, wherein the components r 1 , r 2 , r 3 are as follows.

Calculated, linear way.
[7" claim-type="Currently amended] The method of claim 4, wherein

Each point pair yields three equations, whereby at least four point pairs are needed to solve linearly for the rotational and translational motion.
[8" claim-type="Currently amended] 8. The linear method of claim 7, further comprising decomposing the rotation matrix R using single value decomposition (SVD) to obtain form R = USV T.
[9" claim-type="Currently amended] 8. The linear method of claim 7, further comprising calculating a new rotation matrix according to R = UV T.
[10" claim-type="Currently amended] A linear method for performing head motion estimation from facial feature data, the method comprising:
Obtaining the image positions of four points P_k of the face image 10; and
Determining, to characterize the head pose, a rotation matrix R that maps the points P_k to F_k, wherein the points F_1, F_2, F_3, F_4 represent the three-dimensional (3D) coordinates of each of the four points in the reference image, P_k represents the 3D coordinates of an arbitrary point, where P_i = [X_i Y_i Z_i]^T, and the mapping is determined by the following relation:

where P_5 and P_6 are the midpoints of the line segment 12 connecting the points P_1 P_2 and the line segment 15 connecting the points P_3 P_4, respectively, the line segment connecting the points P_1 P_2 is orthogonal to the line segment connecting the points P_5 P_6, and ∝ represents a proportionality factor.
[11" claim-type="Currently amended] The method of claim 10, wherein the components r 1 , r 2 , r 3 are as follows.

Calculated, linear way.
[12" claim-type="Currently amended] The method of claim 11, wherein the components r 1 , r 2 , r 3 are as follows.

Calculated, linear way.
[13" claim-type="Currently amended] The method of claim 12, wherein the movement of the head points is indicated according to P i '= RP i + T, wherein Is an image rotation, T = [T 1 T 2 T 3 ] T represents a translational movement, and P i 'represents the 3D image position of four points P k of another face image.
[14" claim-type="Currently amended] The method of claim 13,

Each point pair yields three equations, whereby at least four point pairs are needed to solve linearly for the rotational and translational motion.
[15" claim-type="Currently amended] 15. The linear method of claim 14, further comprising decomposing the rotation matrix R using single value decomposition (SVD) to obtain form R = USV T.
[16" claim-type="Currently amended] The linear method of claim 15, further comprising calculating a new rotation matrix according to R = UV T.
[17" claim-type="Currently amended] A program storage device that can be read by a machine that explicitly implements a program of instructions that can be executed by a machine to perform method steps for performing head motion estimation from facial feature data.
The method,
Obtaining a first facial image and detecting the head 10 in the first image;
Detecting the locations of four points P of the first face image, wherein P = {p_1, p_2, p_3, p_4} and p_k = (x_k, y_k);
Obtaining a second facial image and detecting a head in the second image;
Detecting the locations of the four points P' of the second face image, wherein P' = {p'_1, p'_2, p'_3, p'_4} and p'_k = (x'_k, y'_k); and
Determining the motion of the head, represented by the rotation matrix R and the translation vector T, using the points P and P'.
Similar technologies:
Publication number | Publication date | Patent title
Ganapathi et al.2012|Real-time human pose tracking from range data
US9361723B2|2016-06-07|Method for real-time face animation based on single video camera
CN104395932B|2017-04-26|Method for registering data
US10049277B2|2018-08-14|Method and apparatus for tracking object, and method and apparatus for calculating object pose information
US10033985B2|2018-07-24|Camera pose estimation apparatus and method for augmented reality imaging
Plankers et al.2003|Articulated soft objects for multiview shape and motion capture
Smolyanskiy et al.2014|Real-time 3D face tracking based on active appearance model constrained by depth data
Azarbayejani et al.1993|Visually controlled graphics
Negahdaripour1998|Revised definition of optical flow: Integration of radiometric and geometric cues for dynamic scene analysis
Decarlo et al.2000|Optical flow constraints on deformable models with applications to face tracking
JP6681729B2|2020-04-15|Method for determining 3D pose of object and 3D location of landmark point of object, and system for determining 3D pose of object and 3D location of landmark of object
US7133540B2|2006-11-07|Rapid computer modeling of faces for animation
Azuma et al.1994|Improving static and dynamic registration in an optical see-through HMD
Del Bue et al.2006|Non-rigid metric shape and motion recovery from uncalibrated images using priors
Sandini et al.1990|Active tracking strategy for monocular depth inference over multiple frames
EP1870038B1|2009-05-13|Motion capture apparatus and method, and motion capture program
US9058661B2|2015-06-16|Method for the real-time-capable, computer-assisted analysis of an image sequence containing a variable pose
Darrell et al.1995|Cooperative robust estimation using layers of support
JP5647155B2|2014-12-24|Body feature detection and human pose estimation using inner distance shape relation
US6580810B1|2003-06-17|Method of image processing using three facial feature points in three-dimensional head motion tracking
Wang et al.2007|EM enhancement of 3D head pose estimated by point at infinity
EP1677250B1|2012-07-25|Image collation system and image collation method
La Cascia et al.2000|Fast, reliable head tracking under varying illumination: An approach based on registration of texture-mapped 3D models
JP4202479B2|2008-12-24|3D motion restoration system
JP4593968B2|2010-12-08|Position and orientation measurement method and apparatus
Patent family:
Publication number | Publication date
CN1316416C|2007-05-16|
US7027618B2|2006-04-11|
EP1433119A1|2004-06-30|
WO2003030086A1|2003-04-10|
US20030063777A1|2003-04-03|
DE60217143T2|2007-10-04|
AT349738T|2007-01-15|
JP2005505063A|2005-02-17|
EP1433119B1|2006-12-27|
CN1561499A|2005-01-05|
DE60217143D1|2007-02-08|
Cited references:
Publication number | Filing date | Publication date | Applicant | Patent title
Legal status:
2001-09-28|Priority to US09/966,410
2002-09-10|Application filed by Koninklijke Philips Electronics N.V.
2002-09-10|Priority to PCT/IB2002/003713
2004-05-04|Publication of KR20040037152A
Priority:
Application number | Filing date | Patent title
US09/966,410|2001-09-28|
US09/966,410|US7027618B2|2001-09-28|2001-09-28|Head motion estimation from four feature points|
PCT/IB2002/003713|WO2003030086A1|2001-09-28|2002-09-10|Head motion estimation from four feature points|